Michael N. Katehakis, Lynn Kuo, and H. Robbins

The main results obtained during the period 5/84 to 6/86 concern the
following:

I)  Sequential  Allocation  Problems. 

We considered the general, discrete time, effort allocation problem known
as the Multi-Armed Bandit problem. This class of problems was first
formulated by Robbins (1952), and it is an important sequential control
problem with a tractable solution. A simple version of it can be stated as
follows. There are N independent projects (e.g., statistical populations,
manufacturing machines, maintenance actions, etc.). The state of the i-th
project at time t is denoted by $x_i(t)$ and it belongs to a set of states $S_i$
(which in the simplest case is a countable set). At each point of time t =
0,1,... one can work on one project only, and if the i-th of them is selected
one receives a reward $R(t) = r_i(x_i(t))$ and its state changes according to a
known Markovian transition rule $P_i(x_i(t))$ (i.e., the probabilities
$P(x_i(t+1) = y \mid x_i(t) = x)$ are known), while the states of all other
projects remain unchanged. The states of all projects are observable and the
objective is to determine a dynamic effort allocation rule $\pi$ so as to
maximise the expected total discounted reward
$E_\pi\big(\sum_{t=0}^{\infty} \beta^t R(t) \mid x(0)\big)$, for some discount
factor $\beta$ in (0,1).

Department of Applied Mathematics and Statistics, SUNY at Stony Brook.

Department of Mathematical Statistics, Columbia University.


Gittins and Jones (1974) (cf. Gittins (1979), Whittle (1980)) showed that this
general problem can be reduced to N one-dimensional problems. Each of the
latter problems involves a single project, and its solution is the dynamic
allocation index value for the current state of the project. At each point of
time an optimal policy for the original problem allocates effort to the
project with the largest index value in the then current state. If the
present state of a project is x, then the corresponding value of the index is
given by one of the following equivalent expressions:

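(1) $m(x) = \sup_{\tau} \big\{ \big[E\big(\sum_{t=0}^{\tau-1} \beta^t R(t) \mid x(0) = x\big)\big] / \big[1 - E(\beta^{\tau} \mid x(0) = x)\big] \big\}$ ,

(2) $m(x) = \inf\big\{ M : \sup_{\tau} E\big(\sum_{t=0}^{\tau-1} \beta^t R(t) + \beta^{\tau} M \mid x(0) = x\big) = M \big\}$ ,

where, in (1) and (2), $\tau$ is a stopping time associated with the process that
describes the evolution of the state of the project under consideration.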

It is a difficult task to compute the indices via relations (1) and (2).

Subsequently, Katehakis and Veinott (1985) obtained the following
characterisation for the index:

(3) $m(x) = \sup_{\sigma} E_{\sigma}\big(\sum_{t=0}^{\infty} \beta^t R(t) \mid x(0) = x\big)$ ,

where, in (3), $\sigma$ is a return time to state x(0). In (3), m(x) is the value
of the problem in which at any point in time we have to decide whether to
continue using the project or to return it to its initial state (at time
$\sigma$) and start all over again.

Characterisation (3) reduces the problem of computing m(x) to a
standard and easy problem of dynamic programming.
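The dynamic programming problem mentioned above can be sketched as a value iteration for the "restart in state x" problem. This is a minimal illustration, not the authors' algorithm: it assumes a finite state space, a transition matrix `P` given as a list of rows, and a reward list `r`; the function name `restart_index` and its parameters are hypothetical.

```python
def restart_index(P, r, x, beta, tol=1e-12, max_iter=100_000):
    """Value of the restart-in-x problem for one project (assumed finite).

    P    : list of rows, P[y][z] = probability of moving from y to z
    r    : list of one-step rewards, r[y]
    x    : the state whose index is wanted (also the restart state)
    beta : discount factor in (0, 1)
    """
    n = len(r)
    V = [0.0] * n
    for _ in range(max_iter):
        # Value of continuing the project from each state y.
        cont = [r[y] + beta * sum(P[y][z] * V[z] for z in range(n))
                for y in range(n)]
        # Restarting means behaving as if the project were in state x.
        restart = cont[x]
        V_new = [max(c, restart) for c in cont]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new[x]
        V = V_new
    return V[x]


if __name__ == "__main__":
    beta = 0.9
    # Project A: state 0 pays 1 once, then the project is stuck in state 1.
    P_a, r_a = [[0.0, 1.0], [0.0, 1.0]], [1.0, 0.0]
    # Project B: state 1 pays nothing but leads to state 0, which pays 1 forever.
    P_b, r_b = [[1.0, 0.0], [1.0, 0.0]], [1.0, 0.0]
    print(restart_index(P_a, r_a, 0, beta), restart_index(P_b, r_b, 1, beta))
```

With indices computed this way for every project, the index rule described earlier selects, at each time, the project whose current state has the largest value.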

Also, in Katehakis and Veinott (1985) a simpler proof of the original
theorem of Gittins and Jones (and that of Whittle) was given.

In Katehakis and Derman (1985c) the characterisation (3) is used to compute
optimal policies in the context of a sequential clinical trials problem.

II) Optimal Maintenance Policies.

We consider a system of known structure that is composed of N components
and is maintained by R repairmen, where R is less than N.
Component functioning and repair times are random variables with known
distributions. The problem is to characterise dynamic maintenance policies,
i.e., rules for choosing to which failed components repairmen are assigned,
that yield a maximum value to a system measure of performance such as the
expected discounted system operation time and the average expected system
operation time.

Under appropriate assumptions, at any time the status of all components is
given by a vector $x = (x_1,\dots,x_N)$ with $x_i = 1$ or $0$ if the i-th
component is functioning or failed. Similarly, the state of the system is
given by the structure function $\phi$, where $\phi(x) = 1$ or $0$ if the
system is functioning or failed when component status is $x$.
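As an illustration of a structure function, the following sketch evaluates three standard structures on a 0/1 status vector: a series system, a parallel system, and parallel subsystems connected in series. The function names and the representation of subsystems as lists of component indices are assumptions made for this example.

```python
def series(x):
    """Series system: functions only if every component functions."""
    return min(x)


def parallel(x):
    """Parallel system: functions if at least one component functions."""
    return max(x)


def series_of_parallel(x, subsystems):
    """Parallel subsystems connected in series.

    x          : tuple of 0/1 component states
    subsystems : list of lists of component indices, one list per subsystem
    """
    return min(max(x[i] for i in block) for block in subsystems)


if __name__ == "__main__":
    blocks = [[0, 1], [2, 3]]          # two parallel pairs in series
    print(series_of_parallel((1, 0, 1, 0), blocks))  # each pair has a survivor
    print(series_of_parallel((1, 1, 0, 0), blocks))  # second pair has failed
```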

In Katehakis and Derman (1985a) (see also Katehakis (1980)) we considered
systems that are composed of highly reliable components. We extended work
done in Smith (1976) by providing a formulation of the general problem along
the lines of Markovian Decision Theory. Systems composed of highly reliable
components are modeled by assuming that the failure rate for the i-th
component is of the form $\rho\mu_i$, $1 \le i \le N$, for some scalar
$\rho > 0$. Thus, for small values of $\rho$ all components are highly
reliable. Asymptotic power series expansions of the expected discounted
nonfunctioning time $D_\pi(x,\beta)$ are obtained; i.e.,

$D_\pi(x,\beta) = \sum_{k=0}^{\infty} \rho^k D_\pi^{(k)}(x,\beta)$ ,

where $\beta$ denotes the discount rate. For small values of $\rho$ optimal
policies were determined by minimizing the leading coefficients of the above
power series. It was shown that there exists an interval $(0,\rho^*)$ with the
property that if it contains the failure rates of all components, then the
asymptotically optimal policy under consideration is optimal. Recursive
formulas for computing the coefficients $D_\pi^{(k)}(x,\beta)$ were obtained
and were used to derive partial characterisations of asymptotically optimal
policies.

Finally, the explicit form of asymptotically optimal policies for systems of
specific structure, such as the series-parallel system (for $R \ge 2$) and a
system composed of parallel subsystems connected in series (for R = 1), were
given.

III) Empirical Bayes and Prediction.

Let $f(x \mid \theta)$ be a given parametric family of probability density
functions with respect to some $\sigma$-finite measure $\mu$ such that

(4) $\int x f(x \mid \theta)\, d\mu(x) = \theta$ for all $\theta$ .

Let $(\theta, X, Y)$, $(\theta_i, X_i, Y_i)$, i = 1,2,..., be i.i.d. random
vectors such that $\theta$ has some (unknown) distribution function G, while
conditionally on $\theta$, X and Y are independent with respective probability
density functions $f(x \mid \theta)$ and $f(y \mid \lambda\theta)$, where
$\lambda$ is some constant. Finally, let u(x) be a given function dictated by
practical considerations.

Let  us  consider  three  problems: 

Problem I. Estimate the random quantity

$S_n = \sum_{i=1}^{n} u(X_i)\theta_i$

by some function of $X_1,\dots,X_n$.

Problem II. Predict the random quantity

$S_n^* = \sum_{i=1}^{n} u(X_i) Y_i$

by some function of $X_1,\dots,X_n$ and $\lambda$, when $\lambda$ is known
(e.g., $\lambda = 1$).

Problem III. When $\lambda$ is unknown, estimate it by some function of
$X_1,\dots,X_n$ and $Y_1^*,\dots,Y_n^*$, where $Y_i^* = u(X_i) Y_i$.

One way of solving these problems is to try to find a function v(x) such
that

(5) $\int v(x) f(x \mid \theta)\, d\mu(x) = \theta \int u(x) f(x \mid \theta)\, d\mu(x)$ for all $\theta$ ,

and then to define

(6) $V_n = \sum_{i=1}^{n} v(X_i)$ .

From (5) it follows that, irrespective of the nature of the distribution
function G of $\theta$,

(7) $E S_n = E V_n$ , $E S_n^* = \lambda E V_n$ ,

so that $S_n$ can be estimated by $V_n$, $S_n^*$ by $\lambda V_n$ when
$\lambda$ is known, and unknown $\lambda$ can be estimated by $S_n^*/V_n$. The
asymptotic distributions of these estimators have been obtained, and
asymptotic confidence intervals for $S_n$, $S_n^*$, and $\lambda$ have been
found.
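The step from (5) to (7) can be spelled out as follows. This is a short sketch of the argument, writing $\theta$ for the parameter and $\lambda$ for the constant, and using only (4), (5), and the conditional independence of X and Y given $\theta$.

```latex
\begin{align*}
E\,[u(X_i)\theta_i]
  &= E\Big[\theta_i \int u(x) f(x \mid \theta_i)\, d\mu(x)\Big]
   = E\Big[\int v(x) f(x \mid \theta_i)\, d\mu(x)\Big]
   = E\,[v(X_i)] , \\
E\,[u(X_i) Y_i]
  &= E\big[\,E(u(X_i) \mid \theta_i)\, E(Y_i \mid \theta_i)\big]
   = E\Big[\lambda\theta_i \int u(x) f(x \mid \theta_i)\, d\mu(x)\Big]
   = \lambda\, E\,[v(X_i)] .
\end{align*}
```

The second line uses (4) at the parameter value $\lambda\theta_i$, which gives $E(Y_i \mid \theta_i) = \lambda\theta_i$. Summing over i = 1,...,n gives the two equalities in (7).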

We have found solutions of the basic equation (5) for many of the
most common parametric families, and have established the optimality of the
corresponding "u, v" estimators in some cases. The practical importance of
these results is indicated in the proposal for future work.
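As one concrete instance, for the Poisson family with mean $\theta$ (which satisfies (4)), the choice v(x) = x u(x-1) satisfies the basic equation, since x f(x | θ) = θ f(x-1 | θ). The sketch below checks this numerically by truncating the sums; the Poisson family, the truncation point, and all function names are assumptions chosen for illustration, not taken from the report.

```python
import math


def poisson_pmf(theta, n_terms=200):
    """First n_terms Poisson(theta) probabilities, built recursively."""
    p = math.exp(-theta)
    probs = []
    for x in range(n_terms):
        probs.append(p)
        p *= theta / (x + 1)
    return probs


def v_from_u(u):
    """Candidate solution v(x) = x * u(x - 1) for the Poisson family."""
    def v(x):
        return x * u(x - 1) if x > 0 else 0.0
    return v


def check_basic_equation(u, theta, n_terms=200):
    """Return (sum of v*f, theta * sum of u*f), truncated at n_terms."""
    probs = poisson_pmf(theta, n_terms)
    v = v_from_u(u)
    lhs = sum(v(x) * probs[x] for x in range(n_terms))
    rhs = theta * sum(u(x) * probs[x] for x in range(n_terms))
    return lhs, rhs


if __name__ == "__main__":
    # u picks out small counts; any bounded u could be used instead.
    u = lambda x: 1.0 if x <= 2 else 0.0
    print(check_basic_equation(u, theta=3.5))
```

The two returned numbers should agree up to truncation and rounding error for any moderate value of theta.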

An experimenter intends to test the strength of a material by applying
shocks at different levels to groups of components. The response to a shock
is assumed to be dichotomous: either damaged or not. We observe that
$k = (k_1,\dots,k_L)$ components are damaged when $n = (n_1,\dots,n_L)$
components are tested at stress levels $t = (t_1,\dots,t_L)$ respectively
($t_1 < t_2 < \dots < t_L$).

The tolerance distribution is defined by F(t) := probability of damage at
stress level t.


In Kuo (1986) a nonparametric linear Bayes estimator for F is
developed, where the prior is assumed to be Ferguson's Dirichlet process
(1973); we have studied its asymptotic properties and have given numerical
comparisons for some cases.